[spark] Push down variant_get for Spark 4.0#7657

Closed
chenghuichen wants to merge 7 commits into apache:master from chenghuichen:spark_variant

@chenghuichen
Contributor

@chenghuichen chenghuichen commented Apr 15, 2026

Purpose

Queries like SELECT variant_get(v, '$.age', 'int') FROM T on a shredded Variant column still read all sub-columns and reassemble the full binary Variant, leaving Paimon's VariantRowType / clipVariantType infrastructure unused.

This PR adds PushDownVariantExtract (Spark 4 only), a Catalyst optimizer rule that replaces VariantGet with GetStructField and sets variantProjections on PaimonScan, so only the accessed typed_value.* Parquet sub-columns are read.
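The rewrite described above can be illustrated with a toy model in plain Scala. This is only a sketch of the idea, not the PR's code: it does not use Spark's Catalyst classes or Paimon's PaimonScan, and the names `Column`, `VariantGet`, `GetStructField`, `Alias`, `Extraction`, `Rewritten`, and `pushDown` are hypothetical stand-ins defined here. It assumes each pushed-down variant column is exposed as a struct with one field per distinct (path, type) extraction.

```scala
import scala.collection.mutable

object VariantPushDownSketch {
  // Toy expression tree. These are NOT Spark's Catalyst classes.
  sealed trait Expr
  final case class Column(name: String) extends Expr
  final case class VariantGet(child: Expr, path: String, targetType: String) extends Expr
  final case class GetStructField(child: Expr, ordinal: Int) extends Expr
  final case class Alias(child: Expr, name: String) extends Expr

  // One requested extraction: a path into the variant and the type to read it as.
  final case class Extraction(path: String, targetType: String)

  // Rewritten expressions plus, per variant column, the extractions the scan must serve
  // (a stand-in for what the PR calls variantProjections).
  final case class Rewritten(exprs: Seq[Expr], extractions: Map[String, Vector[Extraction]])

  def pushDown(exprs: Seq[Expr]): Rewritten = {
    val perColumn = mutable.LinkedHashMap.empty[String, Vector[Extraction]]

    // Return the struct ordinal for this extraction, registering it if it is new.
    def ordinalOf(column: String, ex: Extraction): Int = {
      val existing = perColumn.getOrElse(column, Vector.empty)
      val i = existing.indexOf(ex)
      if (i >= 0) i
      else {
        perColumn(column) = existing :+ ex
        existing.size
      }
    }

    def rewrite(e: Expr): Expr = e match {
      // Only a variant_get applied directly to a column is pushed down in this sketch.
      case VariantGet(Column(c), path, tpe) =>
        GetStructField(Column(c), ordinalOf(c, Extraction(path, tpe)))
      case VariantGet(child, path, tpe) => VariantGet(rewrite(child), path, tpe)
      case GetStructField(child, ordinal) => GetStructField(rewrite(child), ordinal)
      case Alias(child, name) => Alias(rewrite(child), name)
      case c: Column => c
    }

    val rewritten = exprs.map(rewrite)
    Rewritten(rewritten, perColumn.toMap)
  }

  def main(args: Array[String]): Unit = {
    // Models: SELECT variant_get(v, '$.age', 'int') AS age, variant_get(v, '$.name', 'string') FROM T
    val result = pushDown(Seq(
      Alias(VariantGet(Column("v"), "$.age", "int"), "age"),
      VariantGet(Column("v"), "$.name", "string")
    ))
    println(result.exprs)
    println(result.extractions)
  }
}
```

In this model, repeated extractions of the same path and type share one struct field, so the scan is asked for each sub-column only once. How the actual rule assigns ordinals and handles nested or non-shredded paths is not stated in the description above.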

The rule runs in the "User Provided Optimizers" batch (via experimentalMethods.extraOptimizations) to ensure it fires after V2ScanRelationPushDown has built the scan relation.

Part of #4471

Note: Spark 4.0 lacks a V2-compatible variant push-down interface (SupportsPushDownVariantExtractions was introduced in 4.1), so registering a custom optimizer rule via experimentalMethods.extraOptimizations is the right fit for 4.0. For a future paimon-spark-4.1 module, a cleaner approach would be implementing SupportsPushDownVariantExtractions on PaimonScan and letting Spark's built-in V2ScanRelationPushDown handle the rewrite natively.

Tests

VariantTest.scala::VariantPushDownPlanTest (paimon-spark-4.0)

@JingsongLi
Contributor

cc @Zouxxyy to take a look

@chenghuichen changed the title from "[spark] Push down variant_get into Paimon shredded Variant scan" to "[spark] Push down variant_get for Spark 4.0" on Apr 16, 2026
@chenghuichen
Contributor Author

chenghuichen commented Apr 16, 2026

The variant_get pushdown approach is too hacky for Spark 4.0. The community has decided to target only Spark 4.1 and later, so I am closing this PR.


2 participants